In [1]:
import warnings
warnings.filterwarnings('ignore')

Gradient Boosting Model

In [2]:
import pandas as pd

df = pd.read_csv("Caravan.csv")
df = df.iloc[:,1:]
df.head(2)
Out[2]:
MOSTYPE MAANTHUI MGEMOMV MGEMLEEF MOSHOOFD MGODRK MGODPR MGODOV MGODGE MRELGE ... APERSONG AGEZONG AWAOREG ABRAND AZEILPL APLEZIER AFIETS AINBOED ABYSTAND Purchase
0 33 1 3 2 8 0 5 1 3 7 ... 0 0 0 1 0 0 0 0 0 No
1 37 1 2 2 8 1 4 1 4 6 ... 0 0 0 1 0 0 0 0 0 No

2 rows × 86 columns

Create an 80/20 train/test split with a random state of 19 to ensure reproducibility.

In [3]:
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score, ConfusionMatrixDisplay
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from scipy.stats import randint
In [4]:
df['Purchase'] = df['Purchase'].map({'No':0,'Yes':1})
In [5]:
df.head(2)
Out[5]:
MOSTYPE MAANTHUI MGEMOMV MGEMLEEF MOSHOOFD MGODRK MGODPR MGODOV MGODGE MRELGE ... APERSONG AGEZONG AWAOREG ABRAND AZEILPL APLEZIER AFIETS AINBOED ABYSTAND Purchase
0 33 1 3 2 8 0 5 1 3 7 ... 0 0 0 1 0 0 0 0 0 0
1 37 1 2 2 8 1 4 1 4 6 ... 0 0 0 1 0 0 0 0 0 0

2 rows × 86 columns

In [6]:
# Split the data into features (X) and target (y)
X = df.drop('Purchase', axis=1)
y = df['Purchase']

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=19)

Fit a boosting model to the training data with Purchase as the outcome variable and the remaining variables as predictors. Try 1000 trees and a learning rate of 0.01. Which predictors appear to be the most important? (Hint: Use the GradientBoostingClassifier class and its feature_importances_ attribute.)

In [7]:
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.ensemble import GradientBoostingClassifier
In [8]:
gradient_booster = GradientBoostingClassifier(n_estimators=1000, learning_rate=0.01, random_state = 19)
In [9]:
gradient_booster.fit(X_train,y_train)
Out[9]:
GradientBoostingClassifier(learning_rate=0.01, n_estimators=1000,
                           random_state=19)
In [10]:
from sklearn.inspection import permutation_importance
from sklearn.metrics import mean_squared_error
In [11]:
feat_importances = pd.Series(gradient_booster.feature_importances_, index=X.columns)
feat_importances.nlargest(20).plot(kind='barh')
Out[11]:
<AxesSubplot:>

The top 3 predictors are PPERSAUT, PBRAND and PPLEZIER.

Use the boosting model to predict the outcome variable on the test set. Predict that a person will make a purchase if the estimated probability of purchase is greater than 25%. Create a confusion matrix.

In [12]:
y_predict_prob = gradient_booster.predict_proba(X_test)
In [13]:
y_predict_prob
Out[13]:
array([[0.97859171, 0.02140829],
       [0.82421339, 0.17578661],
       [0.9143759 , 0.0856241 ],
       ...,
       [0.98082219, 0.01917781],
       [0.85329489, 0.14670511],
       [0.98599035, 0.01400965]])
In [14]:
y_predict_prob_class_1 = y_predict_prob[:,1] #This is the prob of purchase
In [15]:
y_predict_prob_class_1
Out[15]:
array([0.02140829, 0.17578661, 0.0856241 , ..., 0.01917781, 0.14670511,
       0.01400965])
In [16]:
y_predict_class = [1 if prob > 0.25 else 0 for prob in y_predict_prob_class_1]
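The thresholding rule can be illustrated on the six probabilities that are visible in the truncated Out[15] display. This is only a sketch: the helper name `classify` and the 0.10 cut-off are illustrative assumptions, not part of the notebook.

```python
# Purchase probabilities copied from the truncated Out[15] display above.
shown_probs = [0.02140829, 0.17578661, 0.0856241, 0.01917781, 0.14670511, 0.01400965]

def classify(probs, threshold):
    """Label an observation 1 (purchase) when its probability exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probs]

labels_at_25 = classify(shown_probs, 0.25)
# Hypothetical lower cut-off, included only to show how the labels shift.
labels_at_10 = classify(shown_probs, 0.10)

print(labels_at_25)
print(labels_at_10)
```

None of the six displayed observations clears the 25% bar, while two of them would be flagged at a 10% cut-off, which shows how strongly the number of predicted purchasers depends on the threshold.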
In [17]:
confusion_matrix(y_test, y_predict_class)
Out[17]:
array([[1068,   31],
       [  57,    9]])
In [18]:
boosting_score = accuracy_score(y_test, y_predict_class)
boosting_score
Out[18]:
0.9244635193133047
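Because purchasers are rare in the test set, accuracy alone can be misleading. As a rough check, the sketch below recomputes the accuracy from the confusion matrix in Out[17] and compares it with a baseline that predicts "no purchase" for everyone. The variable names are illustrative.

```python
# Counts copied from the confusion matrix in Out[17]:
# rows are actual classes (0, 1), columns are predicted classes (0, 1).
tn, fp = 1068, 31
fn, tp = 57, 9

total = tn + fp + fn + tp
model_accuracy = (tn + tp) / total
# A baseline that always predicts 0 gets every actual 0 right and every actual 1 wrong.
baseline_accuracy = (tn + fp) / total

print(f"boosting accuracy: {model_accuracy:.4f}")
print(f"always-0 accuracy: {baseline_accuracy:.4f}")
```

The always-0 baseline scores higher than the boosted model at the 25% threshold, so accuracy says little about how well purchasers are identified.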

What fraction of the people predicted to make a purchase do in fact make a purchase?

In [19]:
9/(31+9)
Out[19]:
0.225

22.5% of the people predicted to make a purchase do in fact make one.
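The 22.5% figure is the precision of the classifier at the 25% threshold. A minimal sketch, assuming the confusion matrix layout used by scikit-learn (rows are actual classes, columns are predicted classes), derives precision and recall from the counts in Out[17]; the function name is hypothetical.

```python
def precision_recall(cm):
    """Return (precision, recall) for class 1 from a 2x2 confusion matrix.

    cm[i][j] is the number of observations of actual class i predicted as class j.
    """
    fp = cm[0][1]
    fn = cm[1][0]
    tp = cm[1][1]
    precision = tp / (tp + fp)  # share of predicted purchasers who purchase
    recall = tp / (tp + fn)     # share of actual purchasers who are flagged
    return precision, recall

# Counts copied from Out[17].
precision, recall = precision_recall([[1068, 31], [57, 9]])
print(f"precision: {precision:.3f}, recall: {recall:.3f}")
```

Recall is the complementary view: of the 66 actual purchasers in the test set, only 9 are flagged.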

How does this result compare with the results from KNN, logistic regression, and random forest? Include your results in a table.

In [20]:
#KNN
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)
y_predict_prob_knn = knn.predict_proba(X_test)
y_predict_prob_class_1_knn = y_predict_prob_knn[:,1]
y_predict_class_knn = [1 if prob > 0.25 else 0 for prob in y_predict_prob_class_1_knn]
knn_score = accuracy_score(y_test, y_predict_class_knn)
knn_score
Out[20]:
0.8755364806866953
In [21]:
#logistic regression
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 19)
classifier.fit(X_train, y_train)
y_predict_prob_lg = classifier.predict_proba(X_test)
y_predict_prob_class_1_lg = y_predict_prob_lg[:,1]
y_predict_class_lg = [1 if prob > 0.25 else 0 for prob in y_predict_prob_class_1_lg]
lg_score = accuracy_score(y_test, y_predict_class_lg)
lg_score
Out[21]:
0.927038626609442
In [22]:
#random forest
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(random_state = 19)
rf.fit(X_train, y_train)
y_predict_prob_rf = rf.predict_proba(X_test)
y_predict_prob_class_1_rf = y_predict_prob_rf[:,1]
y_predict_class_rf = [1 if prob > 0.25 else 0 for prob in y_predict_prob_class_1_rf]
rf_score = accuracy_score(y_test, y_predict_class_rf)
rf_score
Out[22]:
0.9021459227467811
In [23]:
data = {'Models': ['Boosting', 'KNN', 'Logistic Regression', 'Random Forest'],
        'Accuracy': [boosting_score, knn_score, lg_score, rf_score]
        }
table = pd.DataFrame(data)

print(table.sort_values("Accuracy"))
                Models  Accuracy
1                  KNN  0.875536
3        Random Forest  0.902146
0             Boosting  0.924464
2  Logistic Regression  0.927039
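The table compares accuracy, whereas the preceding question was about the fraction of predicted purchasers who actually purchase. One way to put the four models on that footing is to compute the same fraction from each model's thresholded labels. The sketch below uses hypothetical toy labels, not notebook results; in the notebook the inputs would be `y_test` and lists such as `y_predict_class_knn`.

```python
def precision(y_true, y_pred):
    """Fraction of predicted purchasers (label 1) who actually purchased."""
    actual_for_predicted_pos = [t for t, p in zip(y_true, y_pred) if p == 1]
    if not actual_for_predicted_pos:
        return None  # nobody was predicted to purchase, so the fraction is undefined
    return sum(actual_for_predicted_pos) / len(actual_for_predicted_pos)

# Hypothetical toy labels for illustration only.
toy_y_true = [0, 0, 1, 1, 0, 1, 0, 0]
toy_predictions = {
    "model_a": [0, 1, 1, 0, 0, 1, 0, 0],
    "model_b": [0, 0, 0, 0, 0, 0, 0, 0],
}

rows = [(name, precision(toy_y_true, preds)) for name, preds in toy_predictions.items()]
for name, value in rows:
    print(name, value)
```

The `None` case matters here: a model that never crosses the 25% threshold predicts no purchasers at all, and its precision should be reported as undefined rather than zero.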

Classification Tree Pruning

In [24]:
df2 = pd.read_csv("OJ.csv")
df2.head()
Out[24]:
Unnamed: 0 Purchase WeekofPurchase StoreID PriceCH PriceMM DiscCH DiscMM SpecialCH SpecialMM LoyalCH SalePriceMM SalePriceCH PriceDiff Store7 PctDiscMM PctDiscCH ListPriceDiff STORE
0 1 CH 237 1 1.75 1.99 0.00 0.0 0 0 0.500000 1.99 1.75 0.24 No 0.000000 0.000000 0.24 1
1 2 CH 239 1 1.75 1.99 0.00 0.3 0 1 0.600000 1.69 1.75 -0.06 No 0.150754 0.000000 0.24 1
2 3 CH 245 1 1.86 2.09 0.17 0.0 0 0 0.680000 2.09 1.69 0.40 No 0.000000 0.091398 0.23 1
3 4 MM 227 1 1.69 1.69 0.00 0.0 0 0 0.400000 1.69 1.69 0.00 No 0.000000 0.000000 0.00 1
4 5 CH 228 7 1.69 1.69 0.00 0.0 0 0 0.956535 1.69 1.69 0.00 Yes 0.000000 0.000000 0.00 0
In [25]:
df2 = df2.iloc[:,1:]
df2['Purchase'] = df2['Purchase'].map({'CH':0,'MM':1})
df2['Store7'] = df2['Store7'].map({'No':0,'Yes':1})
df2.head()
Out[25]:
Purchase WeekofPurchase StoreID PriceCH PriceMM DiscCH DiscMM SpecialCH SpecialMM LoyalCH SalePriceMM SalePriceCH PriceDiff Store7 PctDiscMM PctDiscCH ListPriceDiff STORE
0 0 237 1 1.75 1.99 0.00 0.0 0 0 0.500000 1.99 1.75 0.24 0 0.000000 0.000000 0.24 1
1 0 239 1 1.75 1.99 0.00 0.3 0 1 0.600000 1.69 1.75 -0.06 0 0.150754 0.000000 0.24 1
2 0 245 1 1.86 2.09 0.17 0.0 0 0 0.680000 2.09 1.69 0.40 0 0.000000 0.091398 0.23 1
3 1 227 1 1.69 1.69 0.00 0.0 0 0 0.400000 1.69 1.69 0.00 0 0.000000 0.000000 0.00 1
4 0 228 7 1.69 1.69 0.00 0.0 0 0 0.956535 1.69 1.69 0.00 1 0.000000 0.000000 0.00 0

Create a training set containing 70% of the observations in the data, with the test set containing the remaining 30%.

In [26]:
# Split the data into features (X) and target (y)
X = df2.drop('Purchase', axis=1)
y = df2['Purchase']

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

Fit a tree to the training data, with Purchase as the response and the other variables as the predictors. Use the get_params() method to list the tree's hyperparameter settings, and describe the results obtained.

What is the training error rate? How many terminal nodes does the tree have?

In [27]:
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
In [28]:
clf = DecisionTreeClassifier(random_state = 42)
clf.fit(X_train, y_train)
clf.get_params()
Out[28]:
{'ccp_alpha': 0.0,
 'class_weight': None,
 'criterion': 'gini',
 'max_depth': None,
 'max_features': None,
 'max_leaf_nodes': None,
 'min_impurity_decrease': 0.0,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'random_state': 42,
 'splitter': 'best'}
In [29]:
predictions = clf.predict(X_train)
from sklearn.metrics import accuracy_score
training_error_rate = 1- accuracy_score(y_train, predictions)
print(training_error_rate) #training error rate
0.008010680907877155
In [30]:
clf.get_n_leaves() #There are 137 leaves/terminal nodes in the tree
Out[30]:
137
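The leaf count alone pins down the size of the tree, since every split in a binary decision tree has exactly two children. The following sketch works through that arithmetic; the variable names are illustrative, and in scikit-learn the total should match `clf.tree_.node_count`.

```python
import math

n_leaves = 137  # from clf.get_n_leaves() above

# A binary tree in which every internal node has two children
# has one fewer internal node than it has leaves.
n_internal = n_leaves - 1
n_nodes = n_leaves + n_internal

# A tree of depth d has at most 2**d leaves, which gives a lower bound on depth.
min_depth = math.ceil(math.log2(n_leaves))

print(n_internal, n_nodes, min_depth)
```

A tree this large, combined with a training error below 1%, is the usual sign of an unpruned tree that has fit the training data very closely, which is what motivates pruning.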

Plot the tree and interpret the results. (Hint: use the dtreeviz package in Python)

In [31]:
#!pip install graphviz
In [32]:
import graphviz
In [33]:
#!pip install dtreeviz
In [34]:
import dtreeviz
In [35]:
viz_model = dtreeviz.model(clf,
                           X_train, y_train,
                           feature_names=X_train.columns.tolist(),
                           target_name='Purchase')

viz_model.view() 
Out[35]:
[Figure: dtreeviz plot of the fitted classification tree]
v3.5.1, https://matplotlib.org/ node255->node256 node262 2025-07-28T20:22:28.362032 image/svg+xml Matplotlib v3.5.1, https://matplotlib.org/ node255->leaf261 node266 2025-07-28T20:22:26.757337 image/svg+xml Matplotlib v3.5.1, https://matplotlib.org/ leaf267 2025-07-28T20:22:41.569047 image/svg+xml Matplotlib v3.5.1, https://matplotlib.org/ node266->leaf267 leaf268 2025-07-28T20:22:41.650059 image/svg+xml Matplotlib v3.5.1, https://matplotlib.org/ node266->leaf268 leaf269 2025-07-28T20:22:41.723055 image/svg+xml Matplotlib v3.5.1, https://matplotlib.org/ node265 2025-07-28T20:22:27.498340 image/svg+xml Matplotlib v3.5.1, https://matplotlib.org/ node265->node266 node270 2025-07-28T20:22:27.781498 image/svg+xml Matplotlib v3.5.1, https://matplotlib.org/ node265->leaf269 leaf271 2025-07-28T20:22:41.801436 image/svg+xml Matplotlib v3.5.1, https://matplotlib.org/ node270->leaf271 leaf272 2025-07-28T20:22:41.881161 image/svg+xml Matplotlib v3.5.1, https://matplotlib.org/ node270->leaf272 node264 2025-07-28T20:22:28.059664 image/svg+xml Matplotlib v3.5.1, https://matplotlib.org/ node264->node265 node264->node270 node262->node264 leaf263 2025-07-28T20:22:41.491386 image/svg+xml Matplotlib v3.5.1, https://matplotlib.org/ node262->leaf263 node254->node255 node254->node262 node246->node247 node246->node254 node238->node239 node238->node246 node180->node181 node180->node238 node172->node173 node172->node180 node162->node163 node162->node172 node0 2025-07-28T20:22:30.419394 image/svg+xml Matplotlib v3.5.1, https://matplotlib.org/ node0->node1  ≤ node0->node162  > legend 2025-07-28T20:21:54.064899 image/svg+xml Matplotlib v3.5.1, https://matplotlib.org/

Predict the response on the test data, and produce a confusion matrix comparing the test labels to the predicted test labels. What is the test error rate?

In [36]:
predictions = clf.predict(X_test)
test_error_rate = 1 - accuracy_score(y_test, predictions)
print(test_error_rate) #test error rate
0.2928348909657321
In [37]:
confusion_matrix(y_test, predictions)
Out[37]:
array([[148,  45],
       [ 49,  79]])
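
As a quick cross-check, the test error rate can be recovered from the confusion matrix itself. The sketch below hard-codes the matrix printed above and relies on scikit-learn's convention that rows are true labels and columns are predicted labels. The variable names are illustrative only.

```python
# Confusion matrix copied from the output above:
# rows = true class, columns = predicted class (scikit-learn convention).
cm = [[148, 45],
      [49, 79]]

total = sum(sum(row) for row in cm)   # number of test observations
correct = cm[0][0] + cm[1][1]         # diagonal entries are correct predictions
misclassified = total - correct       # off-diagonal entries are errors

error_from_cm = misclassified / total
print(total, misclassified, error_from_cm)
```

The off-diagonal share should agree with the value printed by `1 - accuracy_score(y_test, predictions)`.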

Apply the GridSearchCV to the training set in order to determine the optimal tree size. Use the following parameter grid:

tree_param = {'criterion':['gini','entropy'],'max_depth':[4,5,6,7,8,9,10,11,12,15,20,30,40,50], 
              'min_samples_split': [2,3,4,5,6], 'min_samples_leaf' : [2,3,4,5,6]}
In [38]:
from sklearn.model_selection import GridSearchCV
In [39]:
tree_param = {'criterion':['gini','entropy'],'max_depth':[4,5,6,7,8,9,10,11,12,15,20,30,40,50], 
              'min_samples_split': [2,3,4,5,6], 'min_samples_leaf' : [2,3,4,5,6]}
In [40]:
grid_search_cv = GridSearchCV(DecisionTreeClassifier(random_state=42), tree_param, verbose=3, cv=5)
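
To get a feel for how much work this search involves, the sketch below counts the parameter combinations in the grid using only the standard library. It also counts how many of them are effectively distinct, based on the fact that scikit-learn treats a node as splittable only if it holds at least `2 * min_samples_leaf` samples, so smaller `min_samples_split` values add no further constraint. The helper variable names are assumptions made for illustration.

```python
from itertools import product

tree_param = {'criterion': ['gini', 'entropy'],
              'max_depth': [4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 20, 30, 40, 50],
              'min_samples_split': [2, 3, 4, 5, 6],
              'min_samples_leaf': [2, 3, 4, 5, 6]}

# Every combination of the listed values is one grid candidate.
candidates = list(product(*tree_param.values()))
n_candidates = len(candidates)

# Each candidate is fitted once per fold (cv=5 in the call above).
n_folds = 5
total_fits = n_candidates * n_folds

# Candidates that differ only by a min_samples_split at or below
# 2 * min_samples_leaf behave identically, so collapse them.
distinct = {(crit, depth, leaf, max(split, 2 * leaf))
            for crit, depth, split, leaf in candidates}
n_distinct = len(distinct)

print(n_candidates, total_fits, n_distinct)
```

This may explain why several rows of `cv_results_` can share exactly the same scores.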
In [41]:
import contextlib

with contextlib.redirect_stdout(None):  # suppress the verbose=3 progress output
    grid_search_cv.fit(X_train, y_train)
In [42]:
clf = grid_search_cv.best_estimator_
In [43]:
grid_search_cv.best_params_
Out[43]:
{'criterion': 'gini',
 'max_depth': 5,
 'min_samples_leaf': 3,
 'min_samples_split': 2}
In [44]:
clf.fit(X_train, y_train)
Out[44]:
DecisionTreeClassifier(max_depth=5, min_samples_leaf=3, random_state=42)
In [45]:
clf.tree_.node_count
#optimal tree size is 51
Out[45]:
51
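
The node count above includes both internal nodes and leaves. Because scikit-learn decision trees are strictly binary, the number of terminal nodes follows directly from the node count. The sketch below is a small illustration with hypothetical helper names, not part of the original analysis.

```python
def leaves_from_node_count(node_count: int) -> int:
    """Number of leaves in a strictly binary tree with `node_count` nodes."""
    if node_count < 1 or node_count % 2 == 0:
        raise ValueError("a strictly binary tree has an odd, positive node count")
    return (node_count + 1) // 2


def max_nodes_at_depth(max_depth: int) -> int:
    """Upper bound on the node count of a binary tree of the given depth."""
    return 2 ** (max_depth + 1) - 1


n_leaves = leaves_from_node_count(51)   # terminal nodes of the selected tree
n_internal = 51 - n_leaves              # nodes that hold a split
bound = max_nodes_at_depth(5)           # the selected tree uses max_depth=5
print(n_leaves, n_internal, bound)
```

On a fitted estimator the same count should be available through `clf.get_n_leaves()`.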
In [46]:
predictions = clf.predict(X_test)
test_error_rate = 1 - accuracy_score(y_test, predictions)
print(test_error_rate) #test error rate
0.2461059190031153
In [47]:
predictions_train = clf.predict(X_train)
train_error_rate = 1 - accuracy_score(y_train, predictions_train)
print(train_error_rate) #train error rate
0.12416555407209617

Plot the tree size on the x-axis and cross-validated classification error rate on the y-axis. Which tree size corresponds to the lowest cross-validated classification error rate?

In [48]:
tree_cv = pd.DataFrame(grid_search_cv.cv_results_)
In [49]:
tree_cv['tree_size'] = ''
tree_cv['error_rate'] = ''
In [50]:
tree_cv.head()
Out[50]:
mean_fit_time std_fit_time mean_score_time std_score_time param_criterion param_max_depth param_min_samples_leaf param_min_samples_split params split0_test_score split1_test_score split2_test_score split3_test_score split4_test_score mean_test_score std_test_score rank_test_score tree_size error_rate
0 0.012579 0.011842 0.015275 0.021898 gini 4 2 2 {'criterion': 'gini', 'max_depth': 4, 'min_sam... 0.793333 0.8 0.84 0.813333 0.845638 0.818461 0.020981 44
1 0.009833 0.001676 0.007060 0.002132 gini 4 2 3 {'criterion': 'gini', 'max_depth': 4, 'min_sam... 0.793333 0.8 0.84 0.813333 0.845638 0.818461 0.020981 44
2 0.013317 0.007152 0.006985 0.005544 gini 4 2 4 {'criterion': 'gini', 'max_depth': 4, 'min_sam... 0.793333 0.8 0.84 0.813333 0.845638 0.818461 0.020981 44
3 0.008314 0.002541 0.006488 0.005324 gini 4 2 5 {'criterion': 'gini', 'max_depth': 4, 'min_sam... 0.793333 0.8 0.84 0.813333 0.845638 0.818461 0.020981 44
4 0.009239 0.001614 0.005794 0.000798 gini 4 2 6 {'criterion': 'gini', 'max_depth': 4, 'min_sam... 0.793333 0.8 0.84 0.813333 0.845638 0.818461 0.020981 44
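
The cross-validated error rate for a candidate can be read off this table as one minus its mean fold accuracy. The sketch below reproduces that arithmetic for the first row, using the rounded fold scores exactly as displayed, so the result is approximate.

```python
# Fold accuracies for the first candidate, copied from the table above
# (values are rounded in the display).
fold_scores = [0.793333, 0.8, 0.84, 0.813333, 0.845638]

mean_test_score = sum(fold_scores) / len(fold_scores)
cv_error_rate = 1 - mean_test_score

print(round(mean_test_score, 6), round(cv_error_rate, 6))
```

Applied to the whole frame, the equivalent expression would be `1 - tree_cv['mean_test_score']`.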
In [51]:
# Refit every grid candidate on the full training set to record its tree size.
# Note: error_rate below is measured on the held-out test set. The cross-validated
# error for each candidate is 1 - tree_cv['mean_test_score'].
for i in range(len(tree_cv)):
    clf = DecisionTreeClassifier(criterion = tree_cv.loc[i, 'param_criterion'], max_depth= tree_cv.loc[i, 'param_max_depth'], 
                                 min_samples_leaf= tree_cv.loc[i, 'param_min_samples_leaf'], 
                                 min_samples_split = tree_cv.loc[i, 'param_min_samples_split'], random_state=42)
    clf.fit(X_train, y_train)
    tree_cv.loc[i, 'tree_size'] = clf.tree_.node_count
    predictions = clf.predict(X_test)
    tree_cv.loc[i, 'error_rate'] = 1 - accuracy_score(y_test, predictions)
In [52]:
# Several parameter settings produce the same tree size, so group by tree size and keep the minimum error rate for each size
tree_cv_grouped = tree_cv.groupby(['tree_size']).agg({'error_rate':'min'})
tree_cv_grouped = tree_cv_grouped.reset_index()
In [53]:
tree_cv_grouped
Out[53]:
tree_size error_rate
0 25 0.255452
1 27 0.199377
2 29 0.190031
3 43 0.205607
4 45 0.196262
... ... ...
65 217 0.255452
66 219 0.277259
67 221 0.277259
68 231 0.242991
69 233 0.242991

70 rows × 2 columns

In [54]:
import matplotlib.pyplot as plt
In [55]:
plt.figure(figsize=(10,  6))
plt.grid()
plt.plot(tree_cv_grouped['tree_size'], tree_cv_grouped['error_rate'])
plt.xlabel("Tree Size")
plt.ylabel("Classification Error Rate")
Out[55]:
Text(0, 0.5, 'Classification Error Rate')

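According to this graph, the lowest error rate occurs for small trees, at a size of roughly 20 to 30 nodes. Among the rows displayed above, the smallest error rate (0.190031) is at tree size 29. Note that the error rate plotted here was computed on the test set; the cross-validated error rate for each setting is 1 - mean_test_score.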

Produce a pruned tree corresponding to the optimal tree size obtained using cross-validation. If cross-validation does not lead to selection of a pruned tree, then create a pruned tree with five terminal nodes.

In [56]:
clf = grid_search_cv.best_estimator_
In [57]:
path = clf.cost_complexity_pruning_path(X_train, y_train)
path
Out[57]:
{'ccp_alphas': array([0.        , 0.00037887, 0.00057755, 0.00093458, 0.0012016 ,
        0.00144637, 0.00233038, 0.00238924, 0.00247578, 0.00304979,
        0.00330043, 0.00379766, 0.00406541, 0.00480764, 0.00505792,
        0.00516478, 0.00530909, 0.01439439, 0.02019872, 0.02223703,
        0.17499343]),
 'impurities': array([0.17740075, 0.17777962, 0.17835717, 0.17929175, 0.18049335,
        0.1833861 , 0.18571648, 0.18810572, 0.19058149, 0.19668108,
        0.19998151, 0.20377916, 0.21190999, 0.22152526, 0.23164111,
        0.23680589, 0.24211498, 0.25650937, 0.27670809, 0.29894511,
        0.47393855])}
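
The two arrays above are linked: in minimal cost-complexity pruning, each effective alpha equals the increase in total leaf impurity divided by the number of leaves removed at that step. The sketch below uses that relation to estimate how many terminal nodes remain at each alpha. It assumes the unpruned tree has 26 leaves (derived from the 51 nodes reported earlier) and works from the rounded values printed above, so each ratio is rounded to the nearest integer. The helper `leaves_for_alpha` is hypothetical.

```python
from bisect import bisect_right

# Values copied from the pruning path output above.
ccp_alphas = [0.0, 0.00037887, 0.00057755, 0.00093458, 0.0012016,
              0.00144637, 0.00233038, 0.00238924, 0.00247578, 0.00304979,
              0.00330043, 0.00379766, 0.00406541, 0.00480764, 0.00505792,
              0.00516478, 0.00530909, 0.01439439, 0.02019872, 0.02223703,
              0.17499343]
impurities = [0.17740075, 0.17777962, 0.17835717, 0.17929175, 0.18049335,
              0.1833861, 0.18571648, 0.18810572, 0.19058149, 0.19668108,
              0.19998151, 0.20377916, 0.21190999, 0.22152526, 0.23164111,
              0.23680589, 0.24211498, 0.25650937, 0.27670809, 0.29894511,
              0.47393855]

start_leaves = 26  # assumption: (51 nodes + 1) / 2 for the unpruned tree

# alpha_k = (impurity increase at step k) / (leaves removed at step k)
leaves = [start_leaves]
for k in range(1, len(ccp_alphas)):
    gain = impurities[k] - impurities[k - 1]
    removed = round(gain / ccp_alphas[k])
    leaves.append(leaves[-1] - removed)


def leaves_for_alpha(alpha):
    """Leaves left after pruning with the largest path alpha not above `alpha`."""
    k = bisect_right(ccp_alphas, alpha) - 1
    return leaves[k]


print(leaves)
print(leaves_for_alpha(0.01))
```

Under these assumptions the path ends at a single leaf (the root), and any alpha from 0.00530909 up to just below 0.01439439 corresponds to a tree with five terminal nodes.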
In [58]:
ccp_alphas, impurities = path.ccp_alphas, path.impurities
In [59]:
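# Fit one tree per candidate alpha.
# Note: the alphas come from the pruning path of the depth-limited best estimator,
# while the trees below are grown without depth or leaf-size limits before pruning.
clfs = []

for ccp_alpha in ccp_alphas:
    clf = DecisionTreeClassifier(random_state=42, ccp_alpha=ccp_alpha)
    clf.fit(X_train, y_train)
    clfs.append(clf)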
In [60]:
from sklearn.metrics import accuracy_score

acc_scores = [accuracy_score(y_test, clf.predict(X_test)) for clf in clfs]

tree_depths = [clf.tree_.max_depth for clf in clfs]
plt.figure(figsize=(10,  6))
plt.grid()
plt.plot(ccp_alphas[:-1], acc_scores[:-1])
plt.xlabel("effective alpha")
plt.ylabel("Accuracy scores")
Out[60]:
Text(0, 0.5, 'Accuracy scores')

According to this graph, accuracy is highest for alpha between roughly 0.005 and 0.015, so I will use alpha = 0.01, which lies in the middle of that range.
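
The alpha above is read off a plot of test-set accuracy. A common alternative is to choose alpha by cross-validation on the training data only and keep the test set for the final check. The sketch below illustrates that idea on synthetic data from `make_classification`; the dataset, sizes, and variable names are assumptions for illustration and are not the notebook's data or results.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption, not the notebook's dataset).
X_demo, y_demo = make_classification(n_samples=600, n_features=10,
                                     n_informative=4, random_state=42)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=19)

# Candidate alphas from the pruning path of a fully grown tree.
path_demo = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(
    Xd_train, yd_train)
# Drop the last alpha (root-only tree) and guard against tiny negative values.
alphas_demo = [max(float(a), 0.0) for a in path_demo.ccp_alphas[:-1]]

# Mean 5-fold CV accuracy per alpha, computed on the training data only.
cv_means = [
    cross_val_score(DecisionTreeClassifier(random_state=42, ccp_alpha=a),
                    Xd_train, yd_train, cv=5).mean()
    for a in alphas_demo
]
best_idx = max(range(len(alphas_demo)), key=lambda i: cv_means[i])
best_alpha = alphas_demo[best_idx]

# Refit with the selected alpha and evaluate once on the held-out test set.
pruned_demo = DecisionTreeClassifier(random_state=42, ccp_alpha=best_alpha)
pruned_demo.fit(Xd_train, yd_train)
demo_test_error = 1 - pruned_demo.score(Xd_test, yd_test)
print(best_alpha, pruned_demo.tree_.node_count, demo_test_error)
```

Selecting alpha this way keeps the reported test error from being tuned on the same data it is measured on.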

In [61]:
tree = DecisionTreeClassifier(criterion= 'gini', max_depth=5, min_samples_leaf=3, 
                              min_samples_split = 2, random_state=42, ccp_alpha = 0.01)
In [62]:
tree.fit(X_train, y_train)
Out[62]:
DecisionTreeClassifier(ccp_alpha=0.01, max_depth=5, min_samples_leaf=3,
                       random_state=42)
In [63]:
predictions = tree.predict(X_test)
test_error_rate = 1 - accuracy_score(y_test, predictions)
print(test_error_rate) 
0.19937694704049846
In [64]:
predictions_train = tree.predict(X_train)
train_error_rate = 1 - accuracy_score(y_train, predictions_train)
print(train_error_rate)
0.157543391188251

Compare the training error rates and test error rates between the pruned and unpruned trees. Which is higher?

For the test error rate, the unpruned tree scored 0.2461059190031153 and the pruned tree scored 0.19937694704049846. The unpruned tree's error rate was higher, so pruning improved accuracy on the test data.

For the training error rate, the unpruned tree scored 0.12416555407209617 and the pruned tree scored 0.157543391188251. The pruned tree's error rate was higher, so pruning did not improve accuracy on the training data.

This shows that the pruned tree overfits the training data less, which is why its accuracy on the test data is higher.
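
One way to quantify the overfitting is the gap between test and training error for each tree. The sketch below does that arithmetic with the four error rates reported above; the dictionary layout and names are illustrative.

```python
# Error rates copied from the outputs above.
errors = {
    "unpruned": {"train": 0.12416555407209617, "test": 0.2461059190031153},
    "pruned":   {"train": 0.157543391188251,   "test": 0.19937694704049846},
}

# Generalization gap: how much worse each tree does on unseen data.
gaps = {name: e["test"] - e["train"] for name, e in errors.items()}

for name, gap in gaps.items():
    print(f"{name}: train={errors[name]['train']:.4f} "
          f"test={errors[name]['test']:.4f} gap={gap:.4f}")
```

The smaller gap for the pruned tree is consistent with the conclusion that pruning reduces overfitting here.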
